Release 53 Last change: 970506 Last Document Update: 970506
DG USER COMMANDS DG
*
NAME
dg - Data-Grep. Like grep but for searching a free-form
flatfile database, printing the entire records rather than
just the lines containing the searched-for phrase.
Pronounced "dig" as in "digging out data."
*
SYNOPSIS
dg [-options] srchstring infile
srchstring is optional (& ignored) with -s, -l, -U options
srchstring is optional (not ignored) with -a, -A options
srchstring may be multiple with -m, -M, -y options
srchstring is disallowed with the numeric (-n) option
srchstring may be a file (a list of searchterms) with -f option
srchstring may be a file (a list of updates) with -B option
infile may not be wildcarded unless using dgw batch file (below)
*
DESCRIPTION
dg will search a text file for a given phrase and print all "records"
containing that phrase to standard output.
dg is intended for free form "flat" files of text containing records
(multi-line chunks or "paragraphs") separated by a defined delimiter
character (default is "*"). The delimiter character must occur at the
beginning of a line (but see options). Normally, no useful data should
be present ON the delimiter line, as it would be lost on output except
under certain options.
A "paragraph" mode (-dd option) treats blank lines as record
delimiters.
Unlike grep, which would report only the specific lines containing the
search term (or a fixed number of lines either side of the find),
data-grepper will print the entire record in which the search term was
found-- or a specified number of key lines for that record.
The records of the data file are read in order, and records with
"hits" are sent to standard output. Maximum line length is 200
(500 in UNIX versions). Overlong lines on input are tolerated,
but split to meet max line length. dg has no limitations as to
record length, but the price paid is that it cannot accept standard
input from a pipe (the input file is opened twice).
The program does not directly support wildcards, nor does it
understand all unix "regular expressions." A wildcarded list of files
to search may be done using the dgw batch file.
The searchstring is normally used literally on the command line, or,
by using the -f option, up to 100 searchstrings may be specified in a
separate file (100-DOS, 1000-UNIX). Searchstring length is the same
as maximum linelength except when using the -f option, where
searchstring length is limited to 20.
dg searchterm file_to_search
dg -f file_of_searchterms file_to_search
If the searchterm contains spaces, the searchterm must be enclosed in
single (UNIX) or double (DOS) quotes when used on the command line.
Depending on options selected, the first 1-9 lines of each record may
be treated as key lines. Either the search or the report or both may
be limited to these lines.
dg -k3K5 searchterm file_to_search
will look for searchterm only in the first 3 lines of
each record, and print the first 5 lines of records
when a match is found.
The output normally retains the "*" delimiters, thus becoming a subset
of the original data file, ready for further searches.
Other single characters can be used as delimiters by specifying a new
delimiter on the command line.
dg -d# searchterm file_to_search
will expect the # sign as record delimiter.
The special case ("paragraph mode"):
dg -dd searchterm file_to_search
will treat blank lines as record delimiters.
There is an option for additional, secondary sub-delimiters (which may
be blank lines) if your data records are large enough to warrant this.
Null records, those with only newlines between successive delimiter
lines, are ignored and will not be present on output.
Records with whitespace (spaces, tabs) are not treated as nulls.
The original delimiter lines are normally a single "*" character, but
may contain dashes or text following the delimiter character. The
extra characters are treated as "comments" NOT to be printed on output
(unless the -R option is named). Indeed, a whole series of delimiter-
prefixed lines may be included in the master file as "comments" or
documentation, not to be printed on output.
---------------------------------------------------------------------------
*
OPTIONS (in order of average overall usefulness)
-C Case sensitive (default is case insensitive).
-c Count only. Report the number of records
containing the search phrase.
-v inVert sense. Report records not containing
the search phrase.
-d$ Use $ or other following char as Delimiter
Exception: Use -dd (yes - lower case d repeated)
and the system will treat a blank line as the
delimiter for search (sort of like considering
paragraphs as records). Output will, however,
insert the standard "*" delimiter.
-dd Paragraph mode. Blank lines are record Delimiters.
True blank lines only- no spaces or tabs.
See -d above.
-k[n] Search in Keyword-lines-only. Declare the first n (1-9)
lines as keyword-lines. Default 1st line only.
-K[n] Limit output report to Keyword-lines-only.
The first n (1-9) lines are Keyword-lines.
Default 1st line only. n for -k and -K
options limited to 1 digit. They are
independent, and can be used together.
-s Status of data file: give record count only.
If used with -V option, reports misc. file data.
Ignores null records. (Reports them if verbose).
(overrides other options on the command line)
-l Treat aLL records as hits. No searchterm needed.
Useful with -D, -a, -A, -K options.
dg -Kpl will give an undelimited list of
first (key) lines. dg -Kplh# gives a similar
list of major/minor keylines for files with
major (*) records and sub(#)records.
Note- Other uses of -l and -h together are
not recommended.
-w Match only on Words. (phrase bound by spaces
or line boundary or any non-alpha, non-digit)
Underscore (_) is treated as part of a word.
-u Like -w but Underscore (_) is treated as a
word delimiter (as if whitespace) as well.
-x Search phrase is found even if it crosses over
a line boundary (X-over). One-line crossover
only. Ignores trailing but not leading
spaces on lines. Best when used with -T
to ignore leading spaces too. Note: trailing
hyphens are also ignored so that normal word
hyphenation is dealt with.
-f Get searchstrings from File.
Use filename to replace search phrase on the
command line. Leading and trailing spaces in
the file of phrases are stripped. For DOS, number of
searchstrings in the file is limited to 100
*and* a 20 character string size limit is imposed.
Finds are reported in the order they occur in the
data file, not the order of the file of terms.
(use batch files/unix scripts if you must extract
records in other than data file order.)
-F[n] The searchterm must be found in Field n of a
line to be considered a hit. Incompatible
with -mMxnUD and ^$ usage. A field of a line
is defined as in a default awk usage-- words or
terms separated by whitespace, with leading/trailing
whitespace ignored. Use -F with no numerics to
indicate the last field of a line regardless of
the number of fields there. See extended discussion
below.
-L Affects -F option. Lax enforcement of field numbers
and lengths. See extended discussion below.
-m[n] Expect n Multiple search terms on the command line,
each of which must be present on_a_single_line in a
record to cause a find. If n is omitted, n=2.
Incompatible with -v. Max n is 9. If the searchterms
are identical, 1 hit suffices. If used with -x,
finds must be within about 1 line of each other.
-M[n] Expect n Multiple search terms on the command line,
each of which must be present somewhere_in_the_record
to cause a find. If n is omitted, n=2.
Incompatible with -v. Max n is 9.
-E[n] Look for Extras-- expect 1 search term on the command
line, and report records having that term on at
least n separate lines. If n omitted, n=2.
Incompatible with -v.
-p Plain output. Do not print the delimiter on output.
Exception: with -y, kills only the sub-separator line.
-e Exact whole-line match required to cause a find.
-T Ignore L & R (lead/Trail) spaces on all lines.
Useful with -e or -x
-Q Quit on first find of term; on first find of *each* term
when used with -f. Useful with files that redundantly
repeat records, e.g. expanded procedural flows. If more
than one -f term is found in a record, all are satisfied
by printing that record. Do not confuse this with
-m or -M searches. The -Q option then will quit on the
first find satisfying the -m or -M condition.
-W Print only the record numbers where the finds occur.
("Which" records?)
-h$ Use $ or other following char as an added, secondary
"Helper" delimiter. The secondary delimiter will be
recognized whether in the first or second position
on a line. Output will be preceded by the first
line of the main record, and the phrase: "PARTIAL
RECORD:" Not compatible with -x option.
Exception: Use -hh or terminal -h with no character
specified, and the system will treat a virtual blank
line (true blank lines, or lines with only spaces/tabs)
as a secondary delimiter for search (sort of like
considering paragraphs as sub-records within explicitly
delimited records). Output will, however, insert the
standard "*" delimiter.
Example: dg -hhCF1 -h dgman
will give help on the -h option of dg.
-Dfname Divide(distribute) output:
Write records found to files fname0001, fname0002...
one file for each find. Supported ONLY as last option
in the option list. Limited to 9999 output files.
-n#[...] Get record by Number, e.g. -n456 = get 456th record
Compatible ONLY with -vKqod$... NOT with -aA
Null records are ignored when counting.
Supported ONLY as last option in the option list.
A syntax of -n#[#####],#[#####] is supported to retrieve
a range of record numbers. Particularly useful
when a large file must be divided.
-a Print whole data file, Append contents of zzapfile
to finds. See discussion below: UPDATING RECORD STATUS
-A Print whole data file, Append zzapfile line 1 to
keyline 1 of finds. See discussion below: UPDATING
RECORD STATUS
-j Affects -a, -A options- don't print whole
file, but Just the records with finds.
-J Affects -a, -A options- tacks a "Jumped" record
number onto "found" records.
-r Print the delimiter followed by dashes (like a
Ruler line) to enhance visual separation of records.
-R Retain content of original delimiter lines.
The default is to drop additional characters
following the delimiter. (The default permits
the delimiter line to contain "private"
file documentation.)
-B Fold-in Big data updates. Allows automated updates
of large record sets based on a file of update
directions. Highly useful but only in limited
circumstances. See discussion below.
-U Uniqify a set of records. Directs deletion of
repeated records based solely on the last field of
the first keyline. Limited filesizes except in
UNIX versions. See extended discussion below.
-V Verbose. Show prefatory/summary remarks. Use
with -s for datafile status report. Use -Vq with
dgw batch file to record filenames searched.
-H Emphasize the line in the record where the
search conditions were met. Prints markers
(happy faces if in DOS) at beginning of the
"Highlighted" line. Seldom needed, but can be
helpful when individual records are long.
-N Print a Negative message if no records are found.
Normally, there is no output when there are no
finds.
-o Null argument. Does nOthing. Useful from
some batch files/scripts.
-G A Grep-like option. Only the lines with the
match are printed. Use only if a real grep is
unavailable. No REGEXP, but usable with the
following options:
-w, -u, -c, -v, -T, -C, -e, -f, -m, -F, -N, ^$
Not usable with -k,-K,-x,-D,-Q,-y
nor with most other options that are record-oriented.
Inappropriate options are not all trapped, but
generally have no effect.
-y The grep-rest option. Unrelated to -G.
"digs" for a record, Yet greps it too.
Usable with -K such that IF a record is
a "hit" the -K keylines are printed, and
followed by any remaining lines in that record
that contain one of a set of other searchterms.
I.E.- print the keylines of finds and grep the
rest of the record for other searchterms.
The -m option and syntax must be used, but the
the FIRST term given in -m syntax becomes the
SOLE record searchterm and all OTHER -m terms
become what we grep for after the keylines.
Example:
dg -ykK2m3r gold melt boil elements
will print the 1st 2 keylines of records in the
file "elements" having "gold" in the 1st keyline,
and then print any remaining lines in the record
having the terms "melt" or "boil". Use with
the -r option for best visual separation of
resultant records.
-I Ignore delimiter if repeated in place 2.
i.e., if a line begins with ** then
Treat it as just a text line, not a delimiter line.
Useful with certain originals when you don't want
to clean them up first.
-S Add delimiters (Stars) to a file. A delimiter line is
added _before_ each line containing the search term.
Use -Sf and a file of searchterms when appropriate.
-P Add delimiters (Post-stars) to a file. Like -S,
but delimiters are added _following_ each line
that is a hit.
-q Quiet. No extraneous prefatory/summary remarks
(default, but retained for historical reasons).
Exception: Use -Vq with dgw batch file to record
filenames searched.
-i[n] Recognize an Indented delimiter anywhere in the first
n characters (1-9) of a line. Useful in delimiting
code files when the delimiter must reside inside
a comment, e.g.,
/* (c) , //* (c++) , #* (unix), ;* (lisp) , REM * (dos)
Especially useful with -T to kill leading whitespace
for files that have extensive indentation schemes.
Thus a -Ti option allows #* to work with any amount
of leading whitespace.
-Z[Z][1] FuZzy searches-- Look for approximate matches.
The -Z option uses a SOUNDEX algorithm that assumes
the first letter of every word is unfuzzy. Use
the -ZZ option to fuzz even the first letter,
e.g., batter with a searchterm of "patter", but
expect lots of false hits. A Z1 option uses a stem
algorithm that might find "silliness" when you
search for "silly". All three fuzzy approaches
are desperation moves, sometimes laughable.
You may need the -H option to figure out which
line caused the hit. Expect "fuzzy" to be more
like "hairy" or even "wooly" most of the time.
The SOUNDEX approach is an old classic, which gives
decent results when you must search with names or
commonly misspelled words such as nuclear and
personnel, but expect lots of extra drivel as well.
Only the first few syllables are checked. If you're
curious, you can inspect the kind of coding produced
for any searchterm by adding a -N option using a
file you know will NOT produce a match. The "not
found" report will show the soundex or stem code of
the searchterm. Alternatively, add a -V verbose
option and wade through the whole mess.
Expect junk results if you use small searchterms,
numeric searchterms, or searchterms that include
spaces or punctuation. The -w option is disallowed.
Although only words are really treated, there can be
no guarantee of a true wordmatch. The -e option is
allowed, but a hit indicates exactness only in the
coding string, not in the actual text. All fuzzy
searches are automatically case-insensitive.
-O Show PrOgress-- when working very large files,
print some sign of life every 1000 lines
to screen only.
^$ These are not command line options, but implied
options nonetheless. Though full unix regular
expressions are not supported, the ^ and $
expressions are:
dg ^foo filename
means look for "foo" at the beginning of a line.
Similarly:
foo$ means foo at the end of a line
\^foo means search for literal "^foo"
foo\$ means search for literal "foo$"
Note that a search for ^RAT$ is designed to
succeed on "RATCELLAR WITH RAT"
Use -e for the unix sense of ^RAT$
where the intent is SOL-phrase-EOL.
---------------------------------------------------------------------------
*
USAGE: General
This utility is not designed to replace full featured databases with
formal query languages. It is suitable for keeping utility files, such
as address or contact files or software requirements files, when the
purpose of the search is not to settle just for individual lines
containing the desired phrase, but to get the entire paragraph or
record. It is like grep with some notion of context.
It is useful from the command line, but most powerful when used in
batch files that grab a set of records and then do further processing
on them.
While dg is oriented to asterisk-delimited text files, any single-
character delimiter can be used, including blank lines.
Given a data file of simple paragraphs separated by blank lines, dg
can behave as if the blank lines were the "asterisk" delimiters:
dg -dd searchterm datafile
The output will be asterisk-delimited, unless you add the -p (plain)
option. The blank lines must not have hidden spaces or tabs, unless
you use the -T option (trim lead/trail spaces/tabs) option as well.
----------------------------------------------------------------------
*
USAGE: Null Records
Null Records:
A record is "null" if it has no bytes or only line-ending
bytes. Null records are ignored for output, and when
counting to find an Nth record.
Null Keyword lines.
The -sV option will report records that have no data on
keyword lines.
----------------------------------------------------------------------
*
USAGE: With awk and grep in scripts
dg was originally designed to work in concert with awk in scripts
or batch files-- working against initially unformatted text files.
An aside: if you are not familiar with awk, you are
missing one of the best tools available for manipulating
text files. Get a copy of Rob Duff's awk or the GNU
gawk for DOS. It has almost all the power of PERL,
but when you read an awk script six months after you've
written it, you'll understand it. PERL is best only for
folks who will use it every day.
Given a master file of records without explicit delimiters, an easily
designed awk script can place delimiters at appropriate places in a
temporary copy of the original file using either a simple or fairly
sophisticated set of guidelines. dg is then used to do searches on
the temporary file. If the master is updated, the awk script is re-
run to update the temporary file.
The dg -S or -P option can be used instead of an awk for very simple
cases. Or-- if you simply want to trade blank lines for "*" delimiters,
use a -ddl option; the output will have "*" delimiters where the blank
lines were.
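The blank-lines-to-stars trade can be emulated in a few lines. This is an
illustrative Python sketch of what the -ddl combination described above
produces (the function name is invented, and the exact placement of the
emitted delimiters is an assumption):

```python
def paragraphs_to_starred(text):
    """Rough emulation of `dg -ddl file`: treat true blank lines
    (no spaces or tabs) as record delimiters, and emit a '*'
    delimiter line in their place."""
    out = ["*"]          # assumed leading delimiter before record 1
    blank = True
    for line in text.splitlines():
        if line == "":                 # true blank line only
            if not blank:
                out.append("*")        # close the paragraph
            blank = True               # runs of blanks emit one '*'
        else:
            out.append(line)
            blank = False
    return "\n".join(out)
```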
If your primary data is in a commercial database, you may find it
useful to dump a subset of the database to a delimited ASCII file.
Then, for the rest of the day, you can dig at it with dg, directly or
from scripts, without needing to keep the (potentially memory-hogging
or licensed-user-limited) database software running.
----------------------------------------------------------------------
*
USAGE: If records have labeled lines
dg is powerful when used with grep against data files designed to have
a number of labeled lines or "slots" in each record.
With a file such as:
NAME: John Jones
PHONE: 999-9999
UNIT: T-44
EXPER: C, C++, aerodynamics
ASSIGN: Rufus GUI modules
DUE: JAN 95
*
NAME: Jane Smith
PHONE: 999-8888
UNIT: T-55
EXPER: LISP, scheduling, traffic flow, NL
ASSIGN: Rufus NL interface
DUE: FEB 95
*
a command line or script call such as:
dg smith filename | grep EXPER
would yield:
EXPER: LISP, scheduling, traffic flow, NL
or--
dg -ykKm3 smith exper assign filename
would yield:
NAME: Jane Smith
...................
EXPER: LISP, scheduling, traffic flow, NL
ASSIGN: Rufus NL interface
whereas:
dg smith filename
alone would print smith's entire record.
----------------------------------------------------------------------
*
USAGE: AND searches
One can use a script that searches for records with the first term,
redirecting output to a temporary file-- which is then searched for
records with the second term
dg phrase1 filename > temp
dg phrase2 temp
Alternatively, use the -M option:
dg -M phrase1 phrase2 filename
dg -M4 phrase1 phrase2 phrase3 phrase4 filename
If the search is intended to "AND" multiple phrases on a _single_ line,
use the -m option. This is particularly useful when, e.g., you want
to find records containing "DEC" but only if the "DEC" is on a
HARDWARE line and not on a MONTH or DATE line.
dg -m DEC HARDWARE filename
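The difference between the two AND flavors can be shown with a short sketch.
This is an illustrative Python emulation (not dg's actual code) of the -M
(anywhere in the record) and -m (on a single line) behaviors, operating on
records represented as lists of lines:

```python
def and_search(terms, records):
    """Emulates dg -M: a record is a hit only if every term appears
    somewhere in it (case-insensitive, any lines)."""
    def hit(rec):
        text = "\n".join(rec).lower()
        return all(t.lower() in text for t in terms)
    return [r for r in records if hit(r)]

def and_search_one_line(terms, records):
    """Emulates dg -m: all terms must co-occur on a single line
    of the record."""
    return [r for r in records
            if any(all(t.lower() in line.lower() for t in terms)
                   for line in r)]
```

The DEC/HARDWARE example above is exactly the -m case: "DEC" and "HARDWARE"
scattered across different lines of a record do not count.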
----------------------------------------------------------------------
*
USAGE: OR searches
Put the set of searchterms in a file and use the -f option.
----------------------------------------------------------------------
*
USAGE: Searching Multiple Files
There is no provision for wildcards in the datafile name.
Each datafile must be searched individually.
(Use awk to create a script that calls dg against
each of a list of datafiles.)
Alternatively, use the following batch file or its unix equivalent.
Note that the results are always written to a file named ztempx,
in the current working directory. The batch file will complain if
you try to search ALL (*.* or *) files in the current working
directory since that would include the output ztempx file.
If you must search ALL files in a directory (* or *.*), do so from
a higher level directory. dgw -o srchterm asubdir/*.*
will work fine.
The dgw output will not name the file where finds are found,
unless you include a -Vq option. If you do that, dg will
insert a record naming each file searched. To clean out those
advisories, just use dg again with dg -kv "Searching file:" ztempx
to get a "clean" set of results.
---------------------cut here -----------------------------------
@ECHO OFF
rem bat file to use dg with wildcarded list-of-files-to-search
rem usage dgw -dg_arguments searchterm [*.txt or ad.* etc.]
rem note: a dg argument must be given, at least -o (a do-nothing)
rem note: always writes result to file named ztempx (and sends to more)
rem note: thus ztempx must not be in the scope of the wildcard
IF "%1" == "" GOTO helps
IF "%3" == "" GOTO error
ECHO ======Executing the command dg %1 %2 %3 %4 %5 %6
ECHO ======Will overwrite ztempx
ECHO ======RETURN to continue, ctrl-C to quit
PAUSE
IF EXIST ztempx DEL ztempx>NUL
rem if touch program not available, use: @REM redirect_to ztempx
rem touch ztempx
@REM >ztempx
rem current setup for 6 total args: supports, e.g., up to -m3
rem its possible to send 11 arguments with, e.g., -m9
rem expand below to handle 11 if desired
if exist %6 goto six
if exist %5 goto five
if exist %4 goto four
if exist %3 goto three
:six
FOR %%X IN (%6) do if %%X==ZTEMPX goto scope
FOR %%X IN (%6) DO COMMAND/C dg %1 %2 %3 %4 %5 %%X >> ztempx
goto didsearch
:five
FOR %%X IN (%5) do if %%X==ZTEMPX goto scope
FOR %%X IN (%5) DO COMMAND/C dg %1 %2 %3 %4 %%X >> ztempx
goto didsearch
:four
FOR %%X IN (%4) do if %%X==ZTEMPX goto scope
FOR %%X IN (%4) DO COMMAND/C dg %1 %2 %3 %%X >> ztempx
goto didsearch
:three
FOR %%X IN (%3) do if %%X==ZTEMPX goto scope
FOR %%X IN (%3) DO COMMAND/C dg %1 %2 %%X >> ztempx
goto didsearch
:didsearch
echo ================ FINDS: ===================================
TYPE ztempx | more
ECHO =========== Finds placed in file ztempx ===================
GOTO end
:scope
echo ERROR- the wildcard term includes the output file "ztempx"
goto paterror
:error
ECHO dgw error.
:helps
ECHO dgw is used to do a dg-search against a wildcard list-of-files.
ECHO e.g. " dgw -Kp searchterm *.foo "
ECHO A dg argument must be used. Use -o for a do-nothing argument.
ECHO e.g. " dgw -o searchterm *.txt "
:paterror
ECHO The output of each search is written to file "ztempx"
ECHO Be sure that the wildcard term cannot "see" the file ztempx
ECHO NAME.* or *.NAM is ok. But be in a separate directory to use * or *.*
ECHO e.g., NOT "dgw -o searchterm *.*" NOR "dgw -o searchterm *"
ECHO e.g., BUT "dgw -o searchterm subdir/*.*" will work.
:end
---------------------cut here -----------------------------------
----------------------------------------------------------------------
*
USAGE: Creating Tailored Data Sets from A Master
I need to maintain a large set of test datasets. For the actual test,
each must be an individual file, but maintenance is much easier if
they are all kept in a single master file. Each message is delimited.
At test run, a script executes dg with the -D option, creating the
individual files of the targeted datasets, before executing the actual
tests that will act on the individual files.
All datasets include one or more keywords such as "full", others
"fullminus", and others "specialcase4"; the keywords indicate the
class of test. Depending on need, a dg for the desired keyword
produces the tailored test set files.
----------------------------------------------------------------------
*
USAGE: Understanding the Multiple Terms Options ( -m, -M, -E, -y )
These options can be confusing, but each has been a lifesaver at one
time or another. These examples may help:
dg -m foo fum filename - a hit if foo & fum on a single line
dg -m3 foo fum fay filename - a hit if all 3 on a single line
dg -M3 foo fum fay filename - a hit if all 3 anywhere in a record
dg -E3 foo filename - a hit if foo is on at least
3 separate lines in a record
The following are not useful searches, but they help explain the
behavior when searchterms overlap:
dg -M3 foo foo foo filename - will succeed if 1 foo in a record
dg -m3 foo foo foo filename - will succeed if 1 foo in a line
The -y option needs an assist from the -m option in meeting the
command line syntax, but the meaning of terms and behavior are very
different. Also, the -y option may be used only with a -K option.
dg -yKm3 foo fum fay filename
For this case, "foo" becomes the sole searchterm determining
whether a record is a hit. For such records, the first keyline
is printed (-K), and for the remainder of the record, any lines
containing "fum" or "fay" will be printed. The "m" in the options
is used only to bring in the searchterms and then its "normal" meaning
is ignored.
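The -E behavior is the easiest of the set to pin down in code. Here is an
illustrative Python emulation (the function name is invented; dg itself is a
standalone program) of -E[n]: a record is a hit only when the single term
appears on at least n separate lines of that record.

```python
def extras(term, n, records):
    """Emulates dg -E[n]: hit if term occurs on at least n separate
    lines of a record (case-insensitive). Records are lists of lines."""
    t = term.lower()
    return [rec for rec in records
            if sum(1 for line in rec if t in line.lower()) >= n]
```

Note the contrast with -M3 foo foo foo above: -E counts distinct *lines*
containing one term, while -M with repeated terms is satisfied by one find.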
----------------------------------------------------------------------
*
USAGE: Eliminating Dupes In Multiply Appended Results Files.
The -U option is generally useful only if you intend to search a large
set of records several times, appending each result to a collection
file. Naturally this kind of job can result in a final file with
quite a few duplicate records.
To avoid this, first run dg against the master file (or a copy thereof)
with an -aJ option to append a record number to the first keyline of
each master record.
Then run your multiple searches against this modified master,
appending the results of all searches to the collection file.
Finally, run dg with a -U option to create a uniq'd final version.
This option needs to build an array of last-fields "already seen." To
limit memory problems in DOS, no one "last-field" may exceed 10
characters in length, and the total record size to be culled may not
exceed 200 records. A simple awk can do the job for tougher cases.
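The culling rule itself is simple; the following illustrative Python sketch
(names invented, not dg's code) shows the -U logic of keeping only the first
record for each value of the last field of the first keyline:

```python
def uniqify(records):
    """Rough emulation of dg -U: drop any record whose first keyline's
    last whitespace-separated field has already been seen.
    Records are lists of lines; the first line is the keyline."""
    seen, out = set(), []
    for rec in records:
        fields = rec[0].split() if rec else []
        key = fields[-1] if fields else ""
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

This is also why the -aJ record-numbering pass matters: the appended record
number gives each master record a unique last field to key on.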
----------------------------------------------------------------------
*
USAGE: The -F Field Option
The field option is only rarely of use, but very powerful when needed.
This option allows you to limit search actions to specific fields of a
line. E.G., consider the command:
dg -F12k3 Elizabeth records
The F12 indicates that only field 12 of any line should be searched for
the searchterm. The k3 would further limit the search to only the
first 3 lines of any record.
A field is defined like a default "field" in awk-- words or terms
separated by whitespace, with leading/trailing whitespace ignored.
The -F option requires some limitations on the maximum field
length and maximum number of fields per line. For DOS, these limits
are 40 and 20. That is, no one field with a length over 40, nor more
than 20 "words" in any one line.
The option is designed primarily for files in which ALL lines stay
within these limits. Any field exceeding the max length will be
truncated to "fit" and a warning posted to the screen. Any one line
exceeding the max number of fields will cause an error warning and the
program will terminate.
This behavior can help detect unintended errors in the way the data
file was created-- if it was your intent to stay within the limits
given.
For other cases, you may intend that only certain lines will be
"fielded" lines, and others should not be restricted. Use the -L
"lax" option to kill the complaints (-LF). Any field past the 20th will
just be ignored. Fields exceeding max length are quietly truncated.
If -F is used with no numeric attached, the program assumes you intend
to search the LAST field of the line. Behavior for this special case
will be correct even if the normal max number of fields is exceeded.
Total line length limits will still apply.
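The per-line test that -F performs can be sketched as follows. This is an
illustrative Python emulation (not dg's code; it ignores the DOS field-count
and field-length limits discussed above) of awk-style field selection, with
the no-numeric case meaning "last field":

```python
def field_hit(term, line, n=None, case_sensitive=False):
    """Emulates the dg -F[n] test on one line: hit only if term occurs
    in field n (1-based, whitespace-separated, leading/trailing
    whitespace ignored). n=None emulates bare -F: the last field."""
    fields = line.split()
    if not fields:
        return False
    if n is None:
        f = fields[-1]
    elif n <= len(fields):
        f = fields[n - 1]
    else:
        return False            # line has fewer than n fields
    if not case_sensitive:
        term, f = term.lower(), f.lower()
    return term in f
```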
----------------------------------------------------------------------
*
USAGE: Updating Record Status:
If there is a need to append one or more new lines to selected records
in a master file:
-- Put the append text in a file named zzapfile
-- Run dg with the -a option.
-- The entire file will be sent to stdout with the
append text appended to records matching the
search text.
If there is a need to append a phrase to the main key-word line of
selected records:
-- As above, but use the -A option.
-- The contents of line 1 of the zzapfile
will be appended to the 1st line of records
matching the search text.
Use the -J option along with the -a or -A options to append as
described, but inhibit printing of records that do not have a match.
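The append-to-matches flow above can be sketched briefly. This is an
illustrative Python emulation (names invented) of -a, with a flag standing in
for -j to suppress non-matching records:

```python
def append_to_matches(records, term, zzap_lines, just_matches=False):
    """Rough emulation of dg -a (plus -j when just_matches=True):
    append the zzapfile lines to each record containing term, and
    pass all other records through unchanged. Records are lists
    of lines; the original file is never modified."""
    out = []
    for rec in records:
        hit = any(term.lower() in line.lower() for line in rec)
        if hit:
            out.append(rec + zzap_lines)
        elif not just_matches:
            out.append(rec)
    return out
```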
----------------------------------------------------------------------
*
USAGE: Updating Records with the -B option
The -B option allows one to update certain kinds of record files from
a manually (or otherwise) produced update file. This is usable only
with files that use line names at the start of each line.
Such a file, the tgtfile, might hold records such as:
john smith
title: staff engineer
ssn: 999-999-9999
salary: 44444
hired: 960506
*
If one creates an update file, foldfile, such as:
ann smith ~salary: 45555
john smith ~salary: 77777
pat kelly ~salary: 33333
john smith ~ssn: 888-888-8888
Then the command:
dg -kB foldfile tgtfile > zz
would update john (and ann's and pat's) salaries, as well as john's
ssn, leaving other records untouched. If john did NOT have a salary
line, it would be appended as a new line in his record. You can update
multiple elements about john from a single update file. (The updated
results are in file zz; dg never changes the original record file.)
The option assumes you will use unique key terms that will be found
only once in the designated number of keylines. For example, an
update file such as
smith ~salary: 45555
john ~ssn: 777-777-7777
pat ~salary: 33333
will update john smith's ssn or his salary but not both. If "john
smith" were used instead, both lines would be updated.
This is important to understand, especially if you tell the search to
continue over more than one keyline (by using, e.g., -k3). If a valid
hit is found in, say, keyline 2, then actions will be taken based on
that hit-- and only based on the first hit in that line. If you
expected additional actions based on a second possible hit in keyline
2 -- or a separate hit in keyline 3-- you will be disappointed.
The search looks no further than the first hit.
The line title (e.g., ~salary) is always case sensitive. Use of the
-C option will make searches for the key term (e.g., john smith) also
case sensitive.
Note that line title in the update line should be identical to the
one that is used in the file of records if you want to preserve the
original line name. Otherwise the updated record will take on the
line title provided in the update file. For example, john ~STATUS OK
-- will find and replace "STATUS----: BAD", but the new line would be
"STATUS OK" not "STATUS----: OK".
Limitations: The file will look for "john" only in the keylines
specified. You must use the -k[] option with the -B to designate how
many keylines are to be searched. A maximum of 20 data elements about
john can be used in the file of updates. Data in keyline 1 cannot be
changed.
In general, don't try to overwork this option. It's fine for limited
cases. For more complex work, use awk.
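The single-key case above can be sketched with awk standing in for dg
itself (the filenames, the key, and the "salary" line title are all
made up for the illustration; dg's own matching rules may differ in
detail):

```shell
# mimic one keyed update: in any record whose keyline contains
# "john smith", replace the "salary:" line with the update line
cat > records.txt <<'EOF'
*
john smith
salary: 11111
phone: 555-1212
*
pat kelly
salary: 33333
EOF
awk -v key="john smith" -v repl="salary: 77777" '
  /^\*/          { hit = 0 }       # "*" delimiter line: new record
  index($0, key) { hit = 1 }       # keyline contains the key term
  hit && /^salary:/ { $0 = repl }  # swap in the update line
  { print }
' records.txt > updated.txt
cat updated.txt
```

As with -B itself, the original records.txt is never changed; the
updated copy lands in updated.txt.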
----------------------------------------------------------------------
*
USAGE: Non-ASCII Documents
dg works ONLY with ASCII files. If you are keeping your master records
using document publication software, files saved will normally be
in other than pure ASCII. All need not be lost. Most such software
allows saving a pure ASCII version as well.
I've had to keep quite a few documents in ready-to-publish form
using Framemaker or Interleaf. Whenever edits are made, I simply
create an extra, updated ASCII version as well, either directly or
using an awk script to strip out the formatting and graphics in the
extra copy.
----------------------------------------------------------------------
*
USAGE: Help
Typing dg with no arguments provides some cryptic help.
Use the dgh.bat batch file or an equivalent unix script to get a bit better
help on a particular option. Replace the "\dg\dgman" with your own path to
the dgman file.
With this batch file in your path, enter:
dgh -y
to see the part of dgman that describes the -y option.
Alternatively, use the dghh.bat file (or equivalent) to get help using
a likely keyword:
dghh dash
will show help on the -r ruler line option. Replace the "\dg\dgops.sam"
with your own path to that file.
Note: The dgman and dgops.sam files contain embedded spaces on certain
apparently blank lines to keep certain help sections together when using
the -dd option. For example, see the help for -y.
----------------------------------------------------------------------
*
USAGE: Option Confusion
You can come up with a lot of different option combinations using dg.
When you get some combination that does what you want, put it in a
batch file or an alias command. Let the computer do the remembering.
The dg.bat file shown above is a good example of usage.
----------------------------------------------------------------------
*
BEHAVIOR: Treatment of Punctuation
DOS BEHAVIOR:
In a searchterm, <>| must be quoted. The ; and " symbols can be in
a searchterm only if using the -f option. A backslash (\) may be in a
searchterm but must not immediately precede a double quote ("). The %
symbol can be in a searchterm only if using the -f option or using the
command line directly; from a DOS batch file, the % symbol would be
lost.
UNIX BEHAVIOR:
Generally less silly. If you must include punctuation in a searchterm,
you may or may not need to use single quotes around the term. Experiment.
----------------------------------------------------------------------
*
BEHAVIOR: "WORDSEARCH" (-w)
A searchterm "hit" meets -w wordmatch criteria as long as the hit:
-- is bound on left by: SOL, non-alpha, non-digit, non-underscore
-- is bound on right by: EOL, non-alpha, non-digit, non-underscore
(except with -U option, where underscore IS treated as wordbreak)
A "word" bounded by punctuation remains a word. Thus !@#$wow#$&%
will qualify in a wordsearch for "wow". SOL means "Start of Line";
EOL means "End of Line".
ALSO-- digits or punctuation INSIDE the searchterm do not disqualify it
as a "word." For example, "walla6*%^walla7" can be a word "hit" for
the "word" "walla6*%^walla7" since wordmatch only checks the area left &
right of the "hit."
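The boundary rule can be approximated with grep -E (an illustration
only; dg's own matcher is not grep):

```shell
# a find counts as a "word" when bounded by line start/end or any
# character that is not a letter, digit, or underscore
printf '%s\n' '!@#$wow#$&%' 'wowza' 'a wow b' > wtest.txt
grep -E '(^|[^A-Za-z0-9_])wow([^A-Za-z0-9_]|$)' wtest.txt > whits.txt
cat whits.txt
```

Here "!@#$wow#$&%" and "a wow b" qualify, but "wowza" does not.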
----------------------------------------------------------------------
*
BEHAVIOR: UNPRINTABLE CHARACTERS:
The program is not designed to deal with control characters or high-
bit ASCII above 127 in either the text or the searchterms. Consider
the behavior unpredictable when these are present.
----------------------------------------------------------------------
*
EXAMPLES
--- Random Files:
The simplest example uses are for keeping randomly organized address
or contact records, system/software requirements statements, multi-
line quotations or references, mini-help files, to-do files, scheduled
appointments, descriptions of hobby collectibles, recipes, or simply
random ideas. Just separate all "data chunks" with a "*" delimiter.
--- Unordered ASCII Documents (Paragraph Mode):
Any documentation you reference often, or need to extract from, can be
dg-searched to get the right "paragraphs" to standard output, even if
the only delimiters are blank lines.
dg -dd phrase filename
will print all paragraphs containing the phrase, and
dg -vdd phrase filename
will print all paragraphs NOT containing the phrase.
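awk's paragraph mode behaves much like -dd and makes a handy stand-in
when dg is not available (notes.txt and the search phrase here are
invented for the example):

```shell
# RS="" puts awk in paragraph mode: blank lines separate records,
# much like dg -dd
cat > notes.txt <<'EOF'
first note: apples, pears,
and fruit in general

second note: engines and gears
EOF
awk 'BEGIN{RS=""; ORS="\n\n"} /apples/' notes.txt > para.txt
cat para.txt
```

Changing the pattern to !/apples/ mimics the -vdd form.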
--- Master Documents with Summary Lists (Advanced usage)
Often a master document of, say, software trouble reports, can be
delimited and then searched for related topics. The master may be a
manually-maintained one or the ASCII result of a database query.
Item: A22
Block: Control
Date: 940522
Short_Title: Foo fum
Description: This item is not performing correctly whenever
the input is made on a Tuesday before 2:15 PM. Behavior normal
at all other times.
Supply Data: Acme Co. and Ray-Bolixer Inc.
Priority: 3
Assigned: Joe
Due: 940610
*
Item: A33
etc.
*
Perhaps you've created a priorities listfile from it that you use to
keep track of the big picture:
Item Block Date Short_Title Priority Assigned to Due
A22 Control 940522 Foo_fum 3 Joe 940610
A33 Charts 940527 Fum_fee 2 Jane 940616
A44 A-Object 940528 Fee_fie 4 Joe 940622
A55 Control 940530 Fo_fum 1 Dan 940604
Assuming the Block designation is in one of the first, say 3, keylines
of each record, try this:
gawk '$2=="Control"{print "dg -k3 " $1 " master.txt >> temp2.txt"}' listfile > temp1.bat
call temp1 (runs the dg calls; found records accumulate in temp2.txt)
Here master.txt is the master record file.
The awk creates a batch file of dg calls that will give you the
details on the Control block problems. Of course, if the records are
in a full-fledged database, you could query it directly. The dg
approach is primarily of value in batch files/scripts-- especially if
the data source is not worth entering or maintaining in a full fledged
database system.
----------------------------------------------------------------------
*
SEE ALSO
grep, awk, sed
----------------------------------------------------------------------
*
BUGS & LIMITATIONS
Max line length in input file: 200 (500 in unix versions)
Max searchstring length: 200 (500 in unix versions)
Max num of searchterms in -f searchfile: 100 (1000 in unix versions)
Max searchstring length with -f: 20 (500 in unix versions)
Max number of fields for -F options: 20 (100 in unix versions)
Max field length for -x, -F options: 40 (100 in unix versions)
Maximum number of records
when killing dupes: 200 (500 in unix versions)
Overlong lines on input are tolerated, but truncated as far as the
search is concerned.
Error management for insensible option combinations is provided only
for the most common mismatches.
LIMITATION: Use of a file of searchterms:
The -f option provides results only in the order that hits occur in
the original data file. It can be useful to get results in the order
the terms appear in the list of search terms instead.
Use the following awk & script approach as a workaround:
Create the list of search terms as "srchterms"
Plain dg -f srchterms datafile would give
results in the order the terms are found in datafile.
To get results in the order of terms in srchterms, use
awk -f thisawkfile srchterms > temp.bat
Run the resulting temp.bat file.
Results will be in "results.txt"
thisawkfile:
BEGIN{
#assuming the datafile is "datafile"
#and file of searchterms is "srchterms"
datafile= "datafile"
#following for dos, use 39 (') for unix
q = sprintf("%c",34)
print "del results.txt"
#unix only: print "touch results.txt"
}
#main
{print "dg -k " q $0 q " " datafile " >> results.txt"
#above commands are printed to temp.bat when
#called as awk -f thisawkfile srchterms > temp.bat
}
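The idea behind the workaround (one search per term, results appended
in term order) can be demonstrated with grep standing in for dg; the
filenames and terms here are invented for the example:

```shell
# one search per term, appended in searchterm order rather than
# data-file order
printf '%s\n' beta alpha > srchterms
printf '%s\n' 'alpha record' 'beta record' > datafile
: > results.txt
while read -r term
do
  grep "$term" datafile >> results.txt
done < srchterms
cat results.txt
```

results.txt lists "beta record" before "alpha record": searchterm
order wins, even though the data file has them the other way around.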
LIMITATION: No pipe capability TO dg:
The inability to accept input from a pipe can be annoying, but was a
tradeoff for efficiency. The program opens the input file twice, using
the first file pointer to do the searching, with the second playing a
"follower" role to print records that are "finds." This approach
avoids the need for large memory allocations, thus allowing unlimited
record lengths. Unfortunately stdin cannot be "opened twice," thus
piping the output of other commands to dg has been sacrificed to avoid
record size limitations. OUTput from dg can be redirected through a pipe.
Bugs:
Certainly. Let me know what you find.
-- Pete Marikle
----------------------------------------------------------------------
*